FFmpeg & MP4

FFmpeg is a very adaptable tool which can take pretty much any stream and output it as pretty much any other. It can take multiple streams and merge them or it can split a single stream into multiple. This document will focus on how to use FFmpeg to create MP4 files (and associated stuff) for streaming.

Before any flame wars are ignited, I believe that MKV is the superior format. MP4 is limited as to the stream types which are supported but has been around for a long time so can be played by almost anything with a screen. Conversely, MKV is able to hold pretty much anything but probably won't work on your TV, phones or tablets. If you want to archive media with minimal loss, go for MKV. If you want to stream media to pretty much any device, use MP4.

If you need a refresher on Container Formats and Stream Codecs, Google it!

I use MakeMKV to rip DVDs and BluRays, because I'm lazy and it does quite a good job.

To convert any* file to an .mp4 file without specifying any options:

ffmpeg -i input.mkv output.mp4

FFmpeg will automatically transcode any* stream so that it best meets the specifications for MP4.

With the command above, you would have a standards-compliant MP4 which should work with any device that supports MP4 but, if you were to attempt to access this over a network, you would have to download the entire file before being able to play it.

To be able to play an MP4 immediately, you must move the "MOOV Atom" to the beginning of the file. This can be achieved with the following command:

ffmpeg -i input.mp4 -movflags +faststart output.mp4
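When the input is already a playable MP4, the moov atom can be moved without re-encoding by copying the streams as they are. A minimal sketch, assuming a hypothetical helper name (faststart) and placeholder file names; -c copy is FFmpeg's stream-copy option:

```shell
# Hypothetical helper: remux an existing MP4 so the moov atom sits at the front.
# -c copy copies the selected streams without re-encoding, so quality is unchanged.
faststart() {
    ffmpeg -i "$1" -c copy -movflags +faststart "$2"
}

# Usage (file names are placeholders):
#   faststart input.mp4 output.mp4
```

Because nothing is transcoded, this should finish in roughly the time it takes to copy the file.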

Many container formats support multiple audio and subtitle streams. Not all MP4 players support a file containing multiple audio streams and some don't even support opening a file with an embedded subtitle track.

It's possible to choose which input video, audio and subtitle streams are included (or not) in the output MP4 using the 'map' option:

ffmpeg -i input.mkv -map 0:v:0 -map 0:a:1 -map 0:s:3 output.mp4

This will 'map' the first video stream (starting from 0), the second audio stream and the fourth subtitle stream from the input container.
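To find out which index belongs to which stream before mapping, ffprobe (shipped with FFmpeg) can list them. A sketch, assuming a hypothetical helper name (list_streams) and that the input carries language tags:

```shell
# Hypothetical helper: print one line per stream with its index, codec,
# type and language tag (where the input has one).
list_streams() {
    ffprobe -v error \
        -show_entries stream=index,codec_name,codec_type:stream_tags=language \
        -of csv=p=0 "$1"
}

# Usage (file name is a placeholder):
#   list_streams input.mkv
```

Counting the audio lines from 0, in the order they appear, gives the N to use in -map 0:a:N. The same applies to video (v) and subtitle (s) streams.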

It is unusual to have multiple video streams but not impossible. It is common to have audio streams in different languages and to have audio description and commentary tracks. It is also common to have subtitles in different languages and to have descriptive text for the hard of hearing.

It is possible to include video, audio and subtitles from entirely separate files:

ffmpeg -i input.mkv -i input.flac -i input.sub output.mp4

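In this case, with no -map options, ffmpeg does not include every input stream. It picks one stream of each type across all of the inputs: the video stream with the highest resolution, the audio stream with the most channels and the first subtitle stream.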
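To take exactly one stream from each file, the -map option shown earlier can name the input by its position on the command line. A sketch, assuming the video should come from the first input, the audio from the second and the subtitles from the third; mux_inputs is a hypothetical name:

```shell
# Hypothetical helper: video from input 0, audio from input 1, subtitles from input 2.
# The number before the first colon is the input file index (starting from 0).
mux_inputs() {
    ffmpeg -i "$1" -i "$2" -i "$3" \
        -map 0:v:0 -map 1:a:0 -map 2:s:0 \
        "$4"
}

# Usage (file names are placeholders):
#   mux_inputs input.mkv input.flac input.sub output.mp4
```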

If you would like to carefully choose which input streams are included and in what order, see the Advanced Mapping section below.

So, you want to convert an MKV into an MP4 but extract the Subtitles?

ffmpeg -i input.mkv output.srt -sn output.mp4

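The -sn option excludes all subtitle streams from the output file that follows it (output.mp4). The output.srt file, listed before -sn, still receives a subtitle stream.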

This feature is more manageable with Advanced Mapping and/or filter_complex.

I prefer to do this in two passes, one for each output file, which seems quicker in my experience.
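The two-command approach could look like the sketch below. It assumes the first subtitle stream is the wanted one and that it is text-based (bitmap subtitles cannot be written to .srt); extract_then_convert is a hypothetical name:

```shell
# Hypothetical helper: one ffmpeg run per output file.
extract_then_convert() {
    # First command: write the first subtitle stream to its own file.
    ffmpeg -i "$1" -map 0:s:0 "$2" || return 1
    # Second command: transcode to MP4 with all subtitle streams excluded.
    ffmpeg -i "$1" -sn "$3"
}

# Usage (file names are placeholders):
#   extract_then_convert input.mkv output.srt output.mp4
```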

To strip ALL metadata (title, tags, encoder information and chapters):

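ffmpeg -i input.mkv -map_metadata -1 -map_chapters -1 -fflags +bitexact -flags:v +bitexact -flags:a +bitexact output.mp4

The -map_metadata -1 option disables the copying of global (container), stream and chapter metadata. On its own it leaves the chapter markers in place (without their titles), so -map_chapters -1 is added to drop the chapters themselves.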

The -fflags +bitexact -flags:v +bitexact -flags:a +bitexact options remove encoder information from the container, video streams and audio streams.

To strip metadata while retaining chapters:

ffmpeg -i input.mkv -map_metadata:s:0 -1 -map_metadata:p:0 -1 -fflags +bitexact -flags:v +bitexact -flags:a +bitexact output.mp4

The -map_metadata:s:0 -1 -map_metadata:p:0 -1 options remove metadata from the program (container) and streams.

Note: This method will remove the 'title' metadata tag which is required by some media players. To add a 'title' tag, see Set Output Metadata below.

It's possible to add metadata 'tags' to the container (global), the program, chapters or specific streams:

ffmpeg -i input.mkv -metadata:g title="Output" -metadata:s:a language="eng" -metadata:c:0 title="Intro" output.mp4

The -metadata:g title="Output" option sets the global (container) 'title' tag. Global is the default target so :g could be omitted.

The -metadata:s:a language="eng" option sets the 'language' tag on all audio streams. This could be further refined by specifying a stream index.

The -metadata:c:0 title="Intro" option sets the 'title' tag of the first chapter (index starts at 0).

These options override -map_metadata.

Metadata is automatically copied from the first input container if nothing else is specified.
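To check which tags actually ended up in the output, ffprobe can print them. A sketch that only looks at the two tags used above ('title' on the container, 'language' on the streams); show_tags is a hypothetical name:

```shell
# Hypothetical helper: print the container 'title' tag and each stream's 'language' tag.
show_tags() {
    ffprobe -v error \
        -show_entries format_tags=title:stream_tags=language \
        -of default=noprint_wrappers=1 "$1"
}

# Usage (file name is a placeholder):
#   show_tags output.mp4
```

Tags that are missing from the file are simply not printed, which makes this handy after stripping metadata too.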

FFmpeg will automatically use the AVC codec for video streams, the AAC codec for audio streams and the mov_text codec for subtitle streams when outputting to MP4.

To output an MP4 with HEVC video, FLAC audio and WebVTT subtitles:

ffmpeg -i input.mkv -vcodec libx265 -acodec flac -scodec webvtt output.mp4

The -vcodec libx265 option outputs HEVC (H.265) using an old switch which is an alias to -c:v.

The -acodec flac option outputs FLAC using an old switch which is an alias to -c:a.

The -scodec webvtt option outputs WebVTT using an old switch which is an alias to -c:s.

Using the new switches, e.g. -c:a, allows you to use stream specifiers such as -c:a:0. The old switch applies to all streams of a type, audio streams in this case.

It should be noted that this MP4 will not play on most media players. While HEVC might be supported on newer devices, FLAC doesn't seem to be supported by anything except software. I have not tested device support for WebVTT.

Some devices (mostly very old or obsolete) only support the more limited Constrained Baseline or Main profiles.

To specify 'main' profile:

ffmpeg -i input.mkv -c:v libx264 -profile:v main -level:v 4.0 output.mp4

Most modern devices support the more advanced High profile.

Apple's QuickTime only supports *baseline* and *main* profiles and it only supports 4:2:0 chroma subsampling (the yuv420p pixel format). https://trac.ffmpeg.org/wiki/Encode/VFX

iOS devices do not support 10-bit H.264. https://www.reddit.com/r/VIDEOENGINEERING/comments/sumja3/best_h264_profile_for_web_streaming_on_browsers/

A lot of devices only support H.264 High Level 4.0, 4.1 or 4.2. https://docs.veeplay.com/docs/video-guides/video-codec-types-device-support/
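To see what an existing file uses before deciding whether it needs re-encoding, ffprobe can report the profile, level and pixel format of the first video stream. A sketch; video_profile is a hypothetical name:

```shell
# Hypothetical helper: print profile, level and pixel format of the first video stream.
video_profile() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=profile,level,pix_fmt \
        -of default=noprint_wrappers=1 "$1"
}

# Usage (file name is a placeholder):
#   video_profile output.mp4
```

ffprobe reports the H.264 level as an integer, so Level 4.0 should appear as level=40. A 10-bit file would show a pixel format such as yuv420p10le rather than yuv420p.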

All media playing systems can be expected to be able to reproduce stereo audio. Most will not have a surround sound speaker setup, so the media player will need to mix surround streams into stereo. Many surround to stereo mixing algorithms are imperfect, resulting in quiet speech or reduced bass.

A discussion of the best ways to mix surround to stereo can be found here: https://superuser.com/questions/852400/properly-downmix-5-1-to-stereo-using-ffmpeg

ffmpeg -i input.mkv -c:a libfdk_aac -b:a 128k -filter:a "volume=1.66,pan=stereo|FL=0.5*FC+0.707*FL+0.707*BL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.5*LFE" output.mp4

The options -c:a, -b:a and -filter:a can use a stream index for when there are multiple audio streams.

As a rule of thumb, allow 64kbps per channel, so the 2 channels of a stereo stream come to 128kbps. Citation needed.

The filter can take channel index numbers instead of names but I find it easier to work with named channels.
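Not every FFmpeg build includes libfdk_aac. A simpler fallback, sketched below, uses the native aac encoder and -ac 2, which lets FFmpeg apply its own default downmix rather than the coefficients in the pan filter above. The helper names (audio_bitrate, simple_stereo) are hypothetical, and the 64kbps-per-channel figure is the rule of thumb from this section, not an official recommendation:

```shell
# Prints a value usable with -b:a, at 64 kbps per channel.
audio_bitrate() {
    echo "$(( $1 * 64 ))k"
}

# Hypothetical helper: downmix to stereo with FFmpeg's built-in mixer (-ac 2)
# and encode with the native AAC encoder.
simple_stereo() {
    ffmpeg -i "$1" -c:a aac -ac 2 -b:a "$(audio_bitrate 2)" "$2"
}

# Usage (file names are placeholders):
#   simple_stereo input.mkv output.mp4
```

If speech ends up too quiet with the default downmix, the pan filter gives finer control over how much of the centre channel is kept.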

In general, it's better not to scale or letterbox. This permanently alters the video. It's far better to allow the media player to do this on-the-fly instead. Certainly, scaling up reduces the quality of the video more than scaling down.

ffmpeg -i input.mkv -filter:v "scale=(iw*sar)*min(1280/(iw*sar)\,720/ih):ih*min(1280/(iw*sar)\,720/ih), pad=1280:720:(1280-iw*min(1280/iw\,720/ih))/2:(720-ih*min(1280/iw\,720/ih))/2" output.mp4

The above command will scale the image to fit 1280x720, without cropping, while maintaining the aspect ratio. It will then center and pad the image to completely fill 1280x720.
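The arithmetic inside that filter is easier to follow with numbers. The sketch below works out the scaled size and the padding offsets for a given source and target box. It is a simplification: it assumes square pixels (the filter above also accounts for sar) and rounds down to whole pixels. fit_box is a hypothetical name:

```shell
# Prints: scaled_width scaled_height pad_x pad_y
# Arguments: source width, source height, box width, box height.
fit_box() {
    iw=$1 ih=$2 bw=$3 bh=$4
    if [ $(( iw * bh )) -ge $(( ih * bw )) ]; then
        # Source is at least as wide as the box: width is the limiting side.
        w=$bw
        h=$(( ih * bw / iw ))
    else
        # Source is narrower than the box: height is the limiting side.
        h=$bh
        w=$(( iw * bh / ih ))
    fi
    # Center the scaled image by splitting the leftover space in half.
    echo "$w $h $(( (bw - w) / 2 )) $(( (bh - h) / 2 ))"
}

fit_box 1920 800 1280 720   # prints: 1280 533 0 93
fit_box 720 576 1280 720    # prints: 900 720 190 0
```

In the first case a 1920x800 source fills the full width and gets bars above and below (letterbox); in the second a 720x576 source fills the full height and gets bars left and right (pillarbox).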

To omit a section at the beginning or end of the source video, or to create an output video of a specific length:

ffmpeg -i input.mkv -ss 00:10:00 -to 00:20:00 output.mp4

The command above will start the source video at 10 minutes and proceed to copy/transcode 10 minutes to the output file.
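With -ss placed after -i, as above, FFmpeg decodes the input from the start and discards everything before the start point, which can be slow on long files. Placing -ss before -i makes it seek in the input instead, and -t then gives the length to keep. A sketch; trim_fast is a hypothetical name:

```shell
# Hypothetical helper: seek in the input to the start point, then keep a fixed duration.
# Arguments: start time, input file, duration, output file.
trim_fast() {
    ffmpeg -ss "$1" -i "$2" -t "$3" "$4"
}

# Usage (file names are placeholders): start at 10 minutes, keep 10 minutes.
#   trim_fast 00:10:00 input.mkv 00:10:00 output.mp4
```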

Use Constant Rate Factor (CRF) to target a certain quality, usually resulting in a larger file.
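A CRF encode needs only one run. The sketch below uses libx264, where lower CRF values mean higher quality and bigger files and 23 is the encoder's default. The audio settings and the helper name (crf_encode) are my assumptions:

```shell
# Hypothetical helper: single-pass, quality-targeted encode.
# Arguments: input file, output file, optional CRF value (defaults to 23).
crf_encode() {
    ffmpeg -i "$1" -c:v libx264 -crf "${3:-23}" -preset medium \
        -c:a aac -b:a 128k "$2"
}

# Usage (file names are placeholders):
#   crf_encode input.mkv output.mp4        # CRF 23
#   crf_encode input.mkv output.mp4 20     # higher quality, larger file
```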

Use Two Pass Encoding to target a certain bitrate and/or a certain file size.

ffmpeg -y -i input -c:v libx264 -b:v 2600k -pass 1 -an -f null /dev/null && \
ffmpeg -i input -c:v libx264 -b:v 2600k -pass 2 -c:a aac -b:a 128k output.mp4

With either of these methods you can use preset, tune and profile.
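When the goal is a certain file size, the video bitrate for -b:v can be worked out from the target size, the running time and the audio bitrate. A sketch of that arithmetic, ignoring container overhead; video_bitrate is a hypothetical name:

```shell
# Prints the video bitrate in kbit/s.
# Arguments: target size in MiB, duration in seconds, audio bitrate in kbit/s.
video_bitrate() {
    # 1 MiB = 8388608 bits; dividing by 1000 converts bits to kbit.
    total=$(( $1 * 8388608 / ($2 * 1000) ))
    echo $(( total - $3 ))
}

video_bitrate 200 600 128   # prints: 2668
```

So a 10 minute video aimed at 200 MiB, with 128k audio, would use something like -b:v 2668k in both passes.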

Recommendations for video bitrate are wide ranging. For 1080p @ 30fps the recommendations start at 4Mbps all the way up to 10Mbps. The list below is what Netflix uses:

1080p HD: 5 Mbps
720p HD:  3 Mbps
480p SD:  1 Mbps

Video codecs which can be held in an MP4:

MPEG-H HEVC (H.265)
MPEG-4 AVC (H.264)
MPEG-4 Visual
MPEG-2 Video
MPEG-1 Video
H.263

VC-1

M-JPEG
JPEG 2000

Audio codecs which can be held in an MP4:

MPEG-1 Audio Layer 1
MPEG-1 Audio Layer 2
MPEG-1 Audio Layer 3
AAC

AC-3
E-AC-3
Dolby TrueHD

DTS
DTS-HD

Opus
FLAC
ALAC
MLP
ALS
SLS
LPCM
DV Audio
AMR

Subtitle codecs which can be held in an MP4:

WebVTT
TTXT
VobSub
SubRip as TTXT
PGS as VobSub

Technically, MP4 can contain a stream of nearly any type using Private Streams. That said, FFmpeg may not always help you do that.

MP4 has quite a good list of official codecs but this doesn't mean that ALL media players will support them.

Common media player devices have a fairly short list of supported codecs. Old media player software often has an even shorter list.

The HTML5 Video element basically only supports AVC/AAC/WebVTT. This is down to web browser compatibility; the HTML Video element itself could support any codec and WHATWG does suggest others.

According to FFmpeg Documentation - H.264 Video Encoding Guide, "dumb players" only support the YUV planar color space with 4:2:0 chroma subsampling for H.264 video. You may need to use -vf format=yuv420p (or the alias -pix_fmt yuv420p) for your output to work.

According to superuser - What are the differences between H.264 Profiles?, it is best to use the H.264 "Main" profile for web streaming. Most devices that have a screen, and are capable of playing an .mp4 file, support the "Main" profile.

According to Wikipedia - Advanced Video Coding, for 1080p @ 30fps you can use H.264 Main Level 4. Level 4 is the minimum required decoder performance for 1080p @ 30fps, Level 3.1 for 720p @ 30fps and Level 3.0 for 480p @ 30fps.

Progressive is better than Interlaced. According to Wikipedia - Interlaced video, interlaced video builds each frame from two fields (the odd and even lines, captured at different times), whereas progressive video stores every frame whole.

Frame Rate doesn't really matter too much any more; 24fps, 25fps and 30fps are common and all work fine on any device you're likely to use.
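Putting the recommendations above together gives something like the sketch below: H.264 Main profile at Level 4.0, yuv420p, stereo AAC and the moov atom at the front. The exact combination and the helper name (compatible_mp4) are my assumptions rather than a tested guarantee for every device:

```shell
# Hypothetical helper: encode an MP4 aimed at the widest device support.
compatible_mp4() {
    ffmpeg -i "$1" \
        -c:v libx264 -profile:v main -level:v 4.0 -pix_fmt yuv420p \
        -c:a aac -ac 2 -b:a 128k \
        -movflags +faststart "$2"
}

# Usage (file names are placeholders):
#   compatible_mp4 input.mkv output.mp4
```

Level 4.0 only covers up to 1080p @ 30fps, so a larger or faster source would need scaling down or a higher level.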

FFmpeg & MP4

The Basics

Optimise for Streaming

Stream Mapping

Multiple Inputs

Multiple Outputs

Strip Input Metadata

Set Output Metadata

Specify Codec

Specify h264 Profile

Mix Surround into Stereo

Scale and Letterbox

Start and Stop

Two Pass Encoding

Streaming Bitrates

Compatible Video Codecs

Compatible Audio Codecs

Compatible Subtitle Codecs

Pertinent Metadata Tags

Ultimate Compatibility